Model-driven testing

  • 2025-08-23

Alloy단위 테스트를 조화롭게 연결하는 프로세스.


AI가 생성한 콘텐츠가 포함됨. 아래는 모두 AI가 생성한 글이다. Gemini 2.5 Pro와 대화한 내용을 바탕으로 만든 초안 Claude Opus 4.1에게 제공하고 “extended thinking” 모드를 켠 상태에서 개선된 버전을 생성하도록 했다.

Executive Summary

This manual presents a systematic approach to using Alloy as a test generation engine for Test-Driven Development. By treating Alloy models as executable specifications, we can automatically derive comprehensive test suites that catch bugs before they’re written.

Core Workflow: Model → Analyze → Generate Tests → Code → Validate → Refine

Key Principle: Every Alloy construct maps to specific types of tests. The model doesn’t just verify your design—it actively generates the test cases most likely to break your system.


Part 1: The Alloy Modeler’s Guide (The Architect)

Your role is to create a logically sound, verifiable specification that serves as both a design document and a test generation engine. Think of yourself as writing executable requirements.

1.1 Model the System’s Core Structure

Signatures and Fields define your system’s vocabulary and relationships:

sig User {
  posts: set Post,
  following: set User
}

sig Post {
  author: one User,
  comments: set Comment,
  timestamp: one Time
}

sig Comment {
  author: one User,
  content: one String
}

Deliverable for Testing: Each sig becomes a test class, each field becomes a property test.

1.2 Define Invariants with Facts

Facts are non-negotiable rules that must always hold:

fact NoSelfFollowing {
  no u: User | u in u.following
}

fact ConsistentOwnership {
  all p: Post | p in p.author.posts
}

fact NoOrphanedComments {
  all c: Comment | some p: Post | c in p.comments
}

Deliverable for Testing: Each fact generates negative tests that attempt violations.

1.3 Create Operations with Predicates

Predicates model operations and state transitions—your system’s behavior:

pred addComment[pre, post: State, u: User, p: Post, c: Comment] {
  // Preconditions
  p in pre.posts
  c not in pre.allComments
  
  // Postconditions
  post.allComments = pre.allComments + c
  post.posts = pre.posts
  p.comments' = p.comments + c
  c.author' = u
  
  // Frame conditions (what doesn't change)
  all p2: Post - p | p2.comments' = p2.comments
}

Deliverable for Testing: Each predicate generates parameterized test scenarios.

1.4 Explore Valid States with Run Commands

Run commands generate concrete, valid instances:

run ShowActiveDiscussion {
  some p: Post | #p.comments > 3
  some u: User | #u.following > 2
} for 5

Deliverable for Testing: Each instance becomes a “happy path” test with specific data.

1.5 Verify Properties with Assertions

Assertions are properties you believe must be true:

assert NoCommentWithoutPost {
  all c: Comment | some p: Post | c in p.comments
}
check NoCommentWithoutPost for 10

assert FollowingIsAcyclic {
  no u: User | u in u. #
}
check FollowingIsAcyclic for 8

Deliverable for Testing:

  • Passing assertions validate your model’s consistency
  • Failed assertions (counterexamples) become critical test cases

1.6 Find Boundary Conditions

Systematically explore different scopes to find boundaries:

// Find minimum configuration that exhibits interesting behavior
run MinimalNetwork {
  some u: User | #u.following > 0
} for 2

// Find stress scenarios
run StressTest {
  all u: User | #u.posts > 3
} for 10

Deliverable for Testing: Boundary instances become edge case tests.

1.7 Model Temporal Properties (Alloy 6+)

For systems with state evolution:

pred init[s: State] {
  no s.posts
  no s.comments
}

pred stutter[s, s': State] {
  s.posts = s'.posts
  s.comments = s'.comments
}

fact Traces {
  init[first]
  all s: State - last | 
    addPost[s, s.next] or 
    addComment[s, s.next] or 
    stutter[s, s.next]
}

assert EventuallyActiveDiscussion {
  always (some p: Post | eventually #p.comments > 5)
}

Deliverable for Testing: Temporal properties generate sequence-based integration tests.

1.8 The Complete Model Checklist

Before handoff to the programmer:

  • All check commands pass (no counterexamples found)
  • All run commands produce meaningful instances
  • Edge cases explored at minimum and maximum scopes
  • Predicates cover all major operations
  • Facts capture all critical invariants
  • Document why each constraint exists (use comments liberally)

Part 2: The Programmer’s Guide (The Builder)

Your role is to transform the model’s logical guarantees into executable tests, then write code that passes them. The model is your test oracle—it tells you not just what to test, but what the expected behavior should be.

2.1 Setting Up the Test Architecture

Create a test structure that mirrors the model:

# test_structure.py
class ModelBasedTest:
    """Base class for tests derived from Alloy model"""
    
    @classmethod
    def from_alloy_instance(cls, instance_xml):
        """Generate test data from Alloy XML output"""
        # Parse the instance and create test objects
        pass
    
    def assert_invariant(self, fact_name):
        """Verify a fact holds in current state"""
        pass

2.2 Mapping Signatures to Test Classes

➡️ From each sig in the model:

sig Post {
  author: one User,
  comments: set Comment
}

Generate class and property tests:

class TestPost(unittest.TestCase):
    def test_post_must_have_exactly_one_author(self):
        """From: 'author: one User'"""
        post = Post()
        
        # Test 'one' multiplicity
        with self.assertRaises(ValidationError):
            post.save()  # No author
        
        post.author = user1
        post.save()  # Should succeed
        
        with self.assertRaises(ValidationError):
            post.author = [user1, user2]  # Multiple authors
    
    def test_post_can_have_multiple_comments(self):
        """From: 'comments: set Comment'"""
        post = Post(author=user)
        comment1 = Comment(content="First")
        comment2 = Comment(content="Second")
        
        post.add_comment(comment1)
        post.add_comment(comment2)
        
        self.assertEqual(len(post.comments), 2)

2.3 Mapping Facts to Invariant Tests

➡️ From each fact in the model:

fact NoOrphanedComments {
  all c: Comment | some p: Post | c in p.comments
}

Generate violation tests:

class TestInvariants(unittest.TestCase):
    def test_cannot_create_orphaned_comment(self):
        """From fact: NoOrphanedComments"""
        comment = Comment(author=user, content="Orphan")
        
        # Attempt to save comment without associating with post
        with self.assertRaises(InvariantViolation) as ctx:
            comment.save()
        
        self.assertEqual(ctx.exception.message, 
                        "Comment must belong to a post")
    
    def test_cannot_follow_self(self):
        """From fact: NoSelfFollowing"""
        user = User(name="Alice")
        
        with self.assertRaises(InvariantViolation):
            user.follow(user)

2.4 Mapping Run Commands to Happy Path Tests

➡️ From each run instance:

// Instance generated by: run ShowActiveDiscussion for 5
// User0.following = {User1, User2}
// Post0.author = User0
// Post0.comments = {Comment0, Comment1, Comment2}

Generate concrete scenario tests:

class TestScenarios(unittest.TestCase):
    def test_active_discussion_scenario(self):
        """From run: ShowActiveDiscussion"""
        # Setup exact state from Alloy instance
        user0 = User(name="User0")
        user1 = User(name="User1")
        user2 = User(name="User2")
        
        user0.follow(user1)
        user0.follow(user2)
        
        post = Post(author=user0)
        comments = [
            Comment(author=user1, content="Comment0"),
            Comment(author=user2, content="Comment1"),
            Comment(author=user1, content="Comment2")
        ]
        
        for comment in comments:
            post.add_comment(comment)
        
        # Verify the state is valid and behaves correctly
        self.assertEqual(len(post.comments), 3)
        self.assertEqual(len(user0.following), 2)
        self.assertTrue(post.is_active_discussion())

2.5 Mapping Predicates to Operation Tests

➡️ From each pred in the model:

pred addComment[pre, post: State, u: User, p: Post, c: Comment] {
  p in pre.posts
  c not in pre.allComments
  post.allComments = pre.allComments + c
  c.author' = u
}

Generate parameterized operation tests:

class TestOperations(unittest.TestCase):
    @parameterized.expand([
        # Generate test cases from Alloy pred instances
        ("regular_user", "public_post", "text_comment"),
        ("author_self_comment", "own_post", "reply_comment"),
        ("follower_comment", "followed_user_post", "emoji_comment"),
    ])
    def test_add_comment_operation(self, user_type, post_type, comment_type):
        """From pred: addComment"""
        # Setup preconditions
        user = self.create_user(user_type)
        post = self.create_post(post_type)
        comment = self.create_comment(comment_type)
        
        # Verify preconditions from predicate
        self.assertIn(post, system.posts)
        self.assertNotIn(comment, system.all_comments)
        
        # Execute operation
        result = post.add_comment(user, comment)
        
        # Verify postconditions from predicate
        self.assertIn(comment, system.all_comments)
        self.assertEqual(comment.author, user)
        self.assertIn(comment, post.comments)

2.6 Mapping Assertions to Property Tests

➡️ From each counterexample found:

// Counterexample found for: assert NoCycles
// User0.following = User1
// User1.following = User2  
// User2.following = User0  -- Cycle!

Generate regression tests:

class TestProperties(unittest.TestCase):
    def test_prevent_following_cycles(self):
        """From counterexample: NoCycles assertion"""
        user0 = User("Alice")
        user1 = User("Bob")
        user2 = User("Charlie")
        
        user0.follow(user1)
        user1.follow(user2)
        
        # This would create a cycle
        with self.assertRaises(CycleDetectedError):
            user2.follow(user0)

2.7 Boundary and Edge Case Tests

➡️ From scope analysis:

class TestBoundaries(unittest.TestCase):
    def test_minimum_valid_configuration(self):
        """From: run for 2 (minimum scope that works)"""
        # Test with exactly 2 users, 1 post, 1 comment
        users = [User(f"User{i}") for i in range(2)]
        post = Post(author=users[0])
        comment = Comment(author=users[1])
        post.add_comment(comment)
        
        # System should function with minimal objects
        self.assertTrue(system.is_valid())
    
    def test_maximum_stress_scenario(self):
        """From: check for 10 (maximum tested scope)"""
        # Create maximum configuration that Alloy verified
        users = [User(f"User{i}") for i in range(10)]
        
        # Each user creates maximum posts
        for user in users:
            for i in range(10):
                post = Post(author=user)
                # Add maximum comments
                for commenter in users:
                    post.add_comment(Comment(author=commenter))
        
        # System should handle maximum verified load
        self.assertTrue(system.is_valid())
        self.assertLess(system.response_time(), 1.0)  # Performance boundary

2.8 Temporal Property Tests

➡️ From temporal assertions:

class TestTemporalProperties(unittest.TestCase):
    def test_eventually_active_discussion(self):
        """From: assert EventuallyActiveDiscussion"""
        system = System()
        system.init()
        
        # Simulate time progression
        for tick in range(100):
            system.tick()
            
            # Check if we've reached active discussion state
            if any(len(p.comments) > 5 for p in system.posts):
                break
        else:
            self.fail("System never reached active discussion state")
    
    def test_invariant_holds_across_all_transitions(self):
        """From: always (no orphaned comments)"""
        system = System()
        
        for _ in range(1000):  # Random transitions
            operation = random.choice([
                system.add_post,
                system.add_comment,
                system.delete_post,
                system.delete_comment
            ])
            
            try:
                operation(*self.random_args())
            except InvariantViolation:
                pass  # Expected for invalid operations
            
            # Invariant must hold after every transition
            self.assertTrue(system.check_no_orphaned_comments())

2.9 Test Oracle Extraction

Use Alloy’s evaluator to derive expected values:

class TestOracle:
    def extract_expectations_from_instance(self, alloy_instance):
        """
        Use Alloy evaluator to compute expected values
        Example: In Alloy evaluator, run:
          #Post.comments
          User.following
          all p: Post | p.author in p.comments.author
        """
        expectations = {
            'total_comments': alloy_eval("#Post.comments"),
            'following_graph': alloy_eval("User.following"),
            'authors_comment_own': alloy_eval("all p: Post | p.author in p.comments.author")
        }
        return expectations

2.10 Coverage Metrics

Track test coverage against the model:

class ModelCoverageReport:
    def generate_report(self):
        return {
            'signatures': {
                'total': count_sigs_in_model(),
                'tested': count_sigs_with_tests(),
                'coverage': percentage()
            },
            'facts': {
                'total': count_facts_in_model(),
                'tested': count_facts_with_violation_tests(),
                'coverage': percentage()
            },
            'predicates': {
                'total': count_predicates_in_model(),
                'tested': count_predicates_with_tests(),
                'coverage': percentage()
            },
            'assertions': {
                'total': count_assertions_checked(),
                'counterexamples_tested': count_counterexample_tests(),
                'coverage': percentage()
            },
            'boundaries': {
                'min_scope_tested': True/False,
                'max_scope_tested': True/False,
                'edge_cases': count_edge_case_tests()
            }
        }

Part 3: Advanced Techniques

3.1 Mutation Testing Alignment

Every Alloy assertion that passes represents a mutation that would break your system:

class MutationTest(unittest.TestCase):
    def test_mutations_from_alloy_assertions(self):
        """Each passing assertion = a mutation to catch"""
        mutations = [
            lambda: allow_self_following(),  # Violates NoSelfFollowing
            lambda: allow_orphaned_comments(),  # Violates NoOrphanedComments
            lambda: allow_cycles(),  # Violates NoCycles
        ]
        
        for mutation in mutations:
            with self.assertRaises(TestFailure):
                apply_mutation(mutation)
                run_test_suite()

3.2 Performance Test Generation

Use Alloy’s scope limits to design performance tests:

def test_performance_at_alloy_limits(self):
    """If Alloy handles scope N, so should our system"""
    max_scope = 10  # Maximum scope Alloy could verify
    
    # Generate load matching Alloy's maximum scope
    create_users(max_scope)
    create_posts(max_scope * max_scope)
    create_comments(max_scope ** 3)
    
    start_time = time.time()
    system.validate_all_invariants()
    duration = time.time() - start_time
    
    self.assertLess(duration, 5.0)  # Should complete in reasonable time

3.3 Debugging Failed Constraints

When implementation can’t satisfy a model constraint:

def debug_constraint_failure(self, constraint_name):
    """Understand why a constraint fails"""
    # 1. Temporarily remove constraint from model
    # 2. Run check to see what breaks
    # 3. The counterexample shows why constraint was needed
    
    counterexample = alloy_check_without_constraint(constraint_name)
    
    if counterexample:
        print(f"Constraint {constraint_name} prevents: {counterexample}")
        # Either fix implementation or refine model

3.4 Incremental Refinement Process

class ModelRefinement:
    def refine_based_on_implementation_feedback(self):
        """
        When implementation reveals model is too restrictive:
        1. Identify the problematic constraint
        2. Understand its purpose via counterexamples
        3. Refine to be less restrictive while maintaining safety
        4. Regenerate tests
        5. Verify implementation against refined model
        """
        pass

Part 4: Common Patterns and Pitfalls

4.1 Pattern: State Machine Testing

abstract sig State {}
one sig Init, Active, Terminated extends State {}

sig Process {
  var state: one State
}

pred transition[p: Process, from, to: State] {
  p.state = from
  p.state' = to
}

Maps to state transition tests covering all valid paths.

4.2 Pattern: Access Control Testing

pred canAccess[u: User, r: Resource] {
  r in u.owned or
  r in u.group.resources or
  r.public = True
}

Generates permission matrix tests for all user-resource combinations.

4.3 Pitfall: Over-Constraining

Problem: Model too restrictive, implementation can’t satisfy it. Solution: Start with minimal constraints, add incrementally.

4.4 Pitfall: Under-Testing Boundaries

Problem: Only testing within comfortable scopes. Solution: Always test at scope 2 (minimum) and maximum viable scope.


Part 5: Quick Reference

Model → Test Mapping

Alloy Construct Test Type Test Purpose
sig + fields Property tests Validate structure and multiplicities
fact Invariant violation tests Ensure rules cannot be broken
pred Operation tests Test state transitions
run instance Happy path tests Verify valid scenarios work
check counterexample Regression tests Prevent specific bugs
assert (passing) Property tests Verify desired properties hold
Scope boundaries Edge case tests Test at limits
Temporal properties Sequence tests Test behavior over time

Test Generation Checklist

  • Every sig has a test class
  • Every field has a multiplicity test
  • Every fact has a violation test
  • Every successful run has a scenario test
  • Every predicate has parameterized tests
  • Every found counterexample has a regression test
  • Minimum scope (2) tested
  • Maximum scope tested
  • Performance tested at maximum scope
  • Temporal properties have sequence tests
  • Test oracle extracted from Alloy evaluator
  • Coverage report shows >90% model coverage

Conclusion

This approach transforms Alloy from a design verification tool into a comprehensive test generation engine. The model becomes a living specification that not only proves your design is correct but also generates the tests to ensure your implementation matches that design.

Remember: The bugs Alloy finds in your model are the bugs your tests will prevent in your code.